gh-104212: Add importlib.util.load_source_path() function #105755

vstinner · 2023-06-13T22:55:18Z

Issue: Improve the docs regarding the migration from imp to importlib #104212

📚 Documentation preview 📚: https://cpython-previews--105755.org.readthedocs.build/

vstinner · 2023-06-13T23:00:31Z

imp.load_source() has a 3rd parameter file to pass a file-like object, whereas my proposed function has only 2 arguments. Would it be worth it to implement this parameter?

imp.load_source() uses this special loader to implement this parameter:

class _HackedGetData:

    """Compatibility support for 'file' arguments of various load_*()
    functions."""

    def __init__(self, fullname, path, file=None):
        super().__init__(fullname, path)
        self.file = file

    def get_data(self, path):
        """Gross hack to contort loader to deal w/ load_*()'s bad API."""
        if self.file and path == self.path:
            # The contract of get_data() requires us to return bytes. Reopen the
            # file in binary mode if needed.
            if not self.file.closed:
                file = self.file
                if 'b' not in file.mode:
                    file.close()
            if self.file.closed:
                self.file = file = open(self.path, 'rb')

            with file:
                return file.read()
        else:
            return super().get_data(path)


class _LoadSourceCompatibility(_HackedGetData, machinery.SourceFileLoader):

    """Compatibility support for implementing load_source()."""

vstinner · 2023-06-13T23:01:31Z

The function was already proposed in 2012 but nobody actually implemented it: #58756 Maybe less people cared since imp.load_source() was still available :-) But the imp module got removed in Python 3.12!

It was recently requested when someone tried to port their code to Python 3.12: #104212

vstinner · 2023-06-14T15:04:41Z

@warsaw @brettcannon: What do you think of this late Python 3.12 addition? I propose to add it after beta1 since I consider that it's a regression, not a new feature: it replaces removed imp.load_source().

I thought that imp.load_source() supported file-like objects which are not on disk, like ZIP archives or anything else. But the imp implementation can reopen the file from its filename, so I'm not really convinced by its usefulness.

brettcannon · 2023-06-14T21:16:32Z

It's a new feature so I wouldn't want it slipped into Python 3.12 (just because imp had it doesn't mean it was a good idea 😉). Plus the code required to do this is already documented at https://docs.python.org/3/library/importlib.html#importing-a-source-file-directly and it's 4 lines.

vstinner · 2023-06-15T08:23:23Z

It's a new feature so I wouldn't want it slipped into Python 3.12 (just because imp had it doesn't mean it was a good idea wink).

Well, there are projects who need this function: see #104212.

Plus the code required to do this is already documented at https://docs.python.org/3/library/importlib.html#importing-a-source-file-directly and it's 4 lines.

This recipe doesn't work if the filename doesn't end with .py according to #104212.

A recipe in the doc is nice, but people (like me) are unable to find it and so come up with their own bad implementation: https://stackoverflow.com/questions/69884878/replacing-imp-with-importlib-maintaining-old-behavior

Even if the code is short, the code already evolved once: before, load_module() had to be used, and now this method is gone and exec_module() must be used instead.

I'm not going to push too hard to have such function in importlib: I have my own flavor of this function in my project. I already updated it a few time to follow new importlib "trends" and follow deprecations :-) (The first implementation used imp obviously.)

brettcannon · 2023-06-16T00:06:43Z

This recipe doesn't work if the filename doesn't end with .py according to #104212.

And they are incorrect. 😉 That's only an issue if you don't specify the loader does it fail.

brettcannon

I think you need to decide whether this is meant to only import a module and be as simple as possible, or can you also import a package?

Regardless, it's missing a module spec on the module which is critical.

It also needs to be decided on whether this is necessary.

brettcannon · 2023-06-16T00:07:36Z

Lib/importlib/util.py

@@ -246,3 +247,13 @@ def exec_module(self, module):
        loader_state['__class__'] = module.__class__
        module.__spec__.loader_state = loader_state
        module.__class__ = _LazyModule
+
+
+def load_source(module_name, filename):


The name is misleading because you're not loading from source, but from a file that contains source code. I think you're after something like load_source_path().

You're also missing whether this is a package? If that's not possible for simplicity, then that should probably be stated. Otherwise it's a piece of information that a module spec contains.

brettcannon · 2023-06-16T00:08:02Z

Lib/importlib/util.py

+def load_source(module_name, filename):
+    """Load a module from a filename."""
+    loader = SourceFileLoader(module_name, filename)
+    module = types.ModuleType(loader.name)


You should get the module via the spec.

brettcannon · 2023-06-16T00:09:20Z

Lib/importlib/util.py

+    """Load a module from a filename."""
+    loader = SourceFileLoader(module_name, filename)
+    module = types.ModuleType(loader.name)
+    module.__file__ = filename


This should be set by the loader, not manually. And you're missing the spec.

bedevere-bot · 2023-06-16T00:14:42Z

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

brettcannon · 2023-06-16T00:15:55Z

I don't know how anyone else feels, but we can see if there's any other responses.

One key thing, though, is the current implementation in the PR is incomplete as it's missing a spec. It also has to decided whether supporting packages is important or not.

vstinner · 2023-06-16T09:24:37Z

As an user, I have no idea of what a spec is. I have no idea how to use the importlib API correctly. So I copy/paste random recipes on the Internet and hack them for my needs. That's why I like the idea of a clean API to use the importlib API to "load a Python source and import it".

I don't want to load a package. For me, the most common case would be that it's a .py file. I don't see how i would load a package from a filename, since it's a directory. Maybe we need a test to ensure that it doesn't work.

vstinner · 2023-06-16T10:45:12Z

I updated my PR:

PR rebased on main
function renamed to load_source_path()
use a spec
define the behavior (in doc and with tests) for packages: loading a directory raises an error, loading an __init__.py file works as expected (it's a regular file at the end)

New implementation:

def load_source_path(module_name, filename):
    """Load a module from a filename."""
    loader = SourceFileLoader(module_name, filename)
    spec = spec_from_loader(module_name, loader)
    module = module_from_spec(spec)
    sys.modules[module.__name__] = module
    loader.exec_module(module)
    return module

vstinner · 2023-06-16T10:46:16Z

Another imp.load_source() user who doesn't know how to update the code to Python 3.12 importlib API: https://discuss.python.org/t/how-do-i-migrate-from-imp/27885

Lib/importlib/util.py

brettcannon · 2023-06-16T22:09:20Z

As an user, I have no idea of what a spec is.

https://docs.python.org/3/library/importlib.html#importlib.machinery.ModuleSpec 😉

Joking aside, it is documented.

I don't want to load a package.

Then you probably should make sure the name doesn't have a . in it, otherwise you're also not making sure any submodules or subpackages end up on the parent.

Another imp.load_source() user who doesn't know how to update the code to Python 3.12 importlib API: https://discuss.python.org/t/how-do-i-migrate-from-imp/27885

As I said over on discuss.python.org, Miro's ask is actually different from yours. You want to load from a specific path, Miro wants to search and load from a specific directory (note how your use case lacks a search component).

arhadthedev · 2023-06-17T04:12:30Z

Doc/library/importlib.rst

+   A package cannot be imported by its directory path, whereas its
+   ``__init__.py`` file (ex: ``package/__init__.py``) can be imported.
+
+   .. versionadded:: 3.12


The PR has no needs backport to 3.12 label. Should the version be 3.13 instead?

Suggested change

.. versionadded:: 3.12

.. versionadded:: 3.13

Also, it seems there is no support for the backport to satisfy a specific use case:

As I said over on discuss.python.org, Miro's ask is actually different from yours. You want to load from a specific path, Miro wants to search and load from a specific directory (note how your use case lacks a search component).

vstinner · 2023-06-19T07:27:40Z

The PR has no needs backport to 3.12 label. Should the version be 3.13 instead?

I propose to add this function to Python 3.12 to replace the removed imp.load_source(). If it's too late, it can be added to Python 3.13. But it's unpleasant that Python 3.12 users have no solution (in form of a single function call) in the stdlib: otherwise, people will copy/paste these 3-5 lines to their code, and adding a function to Python 3.13 stdlib becomes less relevant?

vstinner · 2023-06-19T08:15:53Z

I updated my PR:

Rebased on main
Squashed commits
Test that the module is always executed, even if it's already cached in sys.modules. Add a "side effect" for that.
Reimplement the function with spec_from_file_location(), instead of spec_from_spec(), to make sure that the module.__file__ attribute is always set. spec_from_file_location() behavior fits better to load_source_path() use case, so future spec_from_file_location() changes are less likely to change the load_source_path() behavior.
The tests check also module.__spec__ attributes and module.__loader__ attributes: relative filename vs absolute filename, check loader.is_package(), check also module.__path__ list.

@brettcannon:

Then you probably should make sure the name doesn't have a . in it, otherwise you're also not making sure any submodules or subpackages end up on the parent.

Ok, I added a check to reject names which contain a dot character.

vstinner · 2023-06-19T08:18:44Z

My implementation is different than Python 3.11 imp.load_source() if the module is already cached in sys.modules: the imp function did not re-execute the module, whereas my implement always execute the module. Which behavior is the "correct" one?

Python 3.11 imp.load_source():

def load_source(name, pathname, file=None):
    loader = _LoadSourceCompatibility(name, pathname, file)
    spec = util.spec_from_file_location(name, pathname, loader=loader)
    if name in sys.modules:
        module = _exec(spec, sys.modules[name])
    else:
        module = _load(spec)
    # To allow reloading to potentially work, use a non-hacked loader which
    # won't rely on a now-closed file object.
    module.__loader__ = machinery.SourceFileLoader(name, pathname)
    module.__spec__.loader = module.__loader__
    return module

See the if name in sys.modules: test.

vstinner · 2023-06-19T13:30:37Z

My implementation is different than Python 3.11 imp.load_source() if the module is already cached in sys.modules: the imp function did not re-execute the module, whereas my implement always execute the module. Which behavior is the "correct" one?

A different design would be to no use sys.modules at all: always create a new module and always execute the module. Getting an module in sys.modules and caching the new module in sys.modules would be the responsibility of the caller.

If load_source_path() returns the module in sys.modules, the issue is that if the name is generic and/or reused to load Python modules under the same name, we get the "wrong module". For example, a buggy test loader using "test" module name, without del sys.modules["test"], would run the same tests.

Moreover, the function disallows the usage of dots in the module name, which makes the name collision even more likely.

vstinner · 2023-06-19T14:05:59Z

Examples of projects using imp.load_source():

coremltools: setup.py : sys.modules is not used, hardcoded "coremltools.version" name (with a dot!)
pyinvoke : sys.modules is not used, hardcoded "mod" module name, it was already replaced with SourceFileLoader("mod", path).load_module() before.

brettcannon · 2023-06-19T20:47:21Z

My implementation is different than Python 3.11 imp.load_source() if the module is already cached in sys.modules: the imp function did not re-execute the module, whereas my implement always execute the module. Which behavior is the "correct" one?

There is no "correct" semantics since this isn't a typical use case. It's trying to work outside of the import system to accomplish something while still using the import system to do some of the work. I think the key question is why would someone want this functionality? If it's to import some module in an odd spot, then it should go into sys.modules. If it's just to get some source file into a module object to call the code in there, then putting the module in sys.modules isn't necessary (and neither is the module name as that could be a dummy value). And if it's being done just to execute the code, then use exec() appropriately.

There's a reason why I haven't put something like this into importlib. 😉

vstinner · 2023-06-20T23:17:46Z

@brettcannon:

There is no "correct" semantics since this isn't a typical use case. It's trying to work outside of the import system to accomplish something while still using the import system to do some of the work. (...)

The more I dig into the details, the more I understand why it's hard to design a single function implementing "exclusive" use cases, especially the question about caching the module in sys.modules or not (3 cases):

don't use the cache and don't store in the cache
don't use the cache and store in the cache
use the cache and store in the cache

Morever, here we are already about replacing existing imp.load_source() calls. But also code written manually (using the imp and importlib modules).

For all these reasons, I give up on my attempt to add importlib.util.load_source_path() function. Instead, I propose to add a "recipe" explaining how to port code using the removed imp module in What's New in Python 3.12: PR #105951.

vstinner requested review from brettcannon, ericsnowcurrently, ncoghlan and warsaw as code owners June 13, 2023 22:55

bedevere-bot added the awaiting core review label Jun 13, 2023

bedevere-bot mentioned this pull request Jun 13, 2023

Improve the docs regarding the migration from imp to importlib #104212

Closed

vstinner mentioned this pull request Jun 13, 2023

imp.load_source docs removed from python3 docs...is this correct? #58756

Closed

brettcannon requested changes Jun 16, 2023

View reviewed changes

bedevere-bot added awaiting changes and removed awaiting core review labels Jun 16, 2023

vstinner force-pushed the importlib_load_source branch from c3a44b9 to fb03867 Compare June 16, 2023 10:41

brettcannon requested changes Jun 16, 2023

View reviewed changes

Lib/importlib/util.py Outdated Show resolved Hide resolved

arhadthedev reviewed Jun 17, 2023

View reviewed changes

vstinner added the needs backport to 3.12 bug and security fixes label Jun 19, 2023

pythongh-104212: Add importlib.util.load_source_path() function

f134e84

vstinner force-pushed the importlib_load_source branch from 8eec111 to f134e84 Compare June 19, 2023 08:12

vstinner changed the title ~~gh-104212: Add importlib.util.load_source() function~~ gh-104212: Add importlib.util.load_source_path() function Jun 19, 2023

vstinner mentioned this pull request Jun 19, 2023

gh-104212: Explain how to port imp code to importlib #105905

Merged

vstinner closed this Jun 20, 2023

vstinner deleted the importlib_load_source branch June 27, 2023 23:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gh-104212: Add importlib.util.load_source_path() function #105755

gh-104212: Add importlib.util.load_source_path() function #105755

vstinner commented Jun 13, 2023 •

edited by github-actions bot

Loading

vstinner commented Jun 13, 2023

vstinner commented Jun 13, 2023

vstinner commented Jun 14, 2023

brettcannon commented Jun 14, 2023

vstinner commented Jun 15, 2023 •

edited

Loading

brettcannon commented Jun 16, 2023

brettcannon left a comment

brettcannon Jun 16, 2023

brettcannon Jun 16, 2023

brettcannon Jun 16, 2023

bedevere-bot commented Jun 16, 2023

brettcannon commented Jun 16, 2023

vstinner commented Jun 16, 2023

vstinner commented Jun 16, 2023

vstinner commented Jun 16, 2023

brettcannon commented Jun 16, 2023

arhadthedev Jun 17, 2023

arhadthedev Jun 17, 2023

vstinner commented Jun 19, 2023

vstinner commented Jun 19, 2023 •

edited

Loading

vstinner commented Jun 19, 2023

vstinner commented Jun 19, 2023

vstinner commented Jun 19, 2023

brettcannon commented Jun 19, 2023

vstinner commented Jun 20, 2023

gh-104212: Add importlib.util.load_source_path() function #105755

gh-104212: Add importlib.util.load_source_path() function #105755

Conversation

vstinner commented Jun 13, 2023 • edited by github-actions bot Loading

vstinner commented Jun 13, 2023

vstinner commented Jun 13, 2023

vstinner commented Jun 14, 2023

brettcannon commented Jun 14, 2023

vstinner commented Jun 15, 2023 • edited Loading

brettcannon commented Jun 16, 2023

brettcannon left a comment

Choose a reason for hiding this comment

brettcannon Jun 16, 2023

Choose a reason for hiding this comment

brettcannon Jun 16, 2023

Choose a reason for hiding this comment

brettcannon Jun 16, 2023

Choose a reason for hiding this comment

bedevere-bot commented Jun 16, 2023

brettcannon commented Jun 16, 2023

vstinner commented Jun 16, 2023

vstinner commented Jun 16, 2023

vstinner commented Jun 16, 2023

brettcannon commented Jun 16, 2023

arhadthedev Jun 17, 2023

Choose a reason for hiding this comment

arhadthedev Jun 17, 2023

Choose a reason for hiding this comment

vstinner commented Jun 19, 2023

vstinner commented Jun 19, 2023 • edited Loading

vstinner commented Jun 19, 2023

vstinner commented Jun 19, 2023

vstinner commented Jun 19, 2023

brettcannon commented Jun 19, 2023

vstinner commented Jun 20, 2023

vstinner commented Jun 13, 2023 •

edited by github-actions bot

Loading

vstinner commented Jun 15, 2023 •

edited

Loading

vstinner commented Jun 19, 2023 •

edited

Loading